[BugFix] Overhaul async request cancellation #7111

Merged: 7 commits into vllm-project:main on Aug 7, 2024

Conversation

@njhill (Member) commented Aug 3, 2024

There are a number of problems currently with how request cancellation works upon client disconnection in the openai api server front-end and AsyncLLMEngine:

  • Abort logic is missing for streaming chat requests - these continue to run after the client disconnects
  • For multi-prompt completion requests, only one of the n running sequences actually gets aborted
  • Cancelled requests that are queued will still be pre-filled before getting aborted
  • Some of the existing async generator logic may be contributing to clean-shutdown garbage collection issues

This is a problem for production resilience and has a compounding effect when the server is overloaded and client requests time out, since the server keeps doing useless work.

This PR reworks how the cancellation is propagated to make it more robust and consistent:

  • Use native asyncio task/generator cancellation propagation as much as possible. This includes making use of explicitly cancellable async generators (via their aclose() method) rather than async iterators in most cases
  • Re-implement merge_async_iterators to encapsulate polling for disconnection, even before any results have been produced (i.e. while the request is still queued)
  • Add an equivalent iterate_with_cancellation function used for the single-prompt request cases (see the sketch after this list)
  • In AsyncLLMEngine, differentiate between cancelled requests and those that finish normally (the generator returned from generate() will now raise a CancelledError in the former case)
  • Don't abort requests that have already completed normally. This avoids some redundant work in the engine looking up sequence groups that are already gone
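
For illustration only, here is a minimal sketch of what an iterate_with_cancellation-style wrapper can look like. This is not the actual vLLM implementation; the is_cancelled callback, the poll period, and the exact signature are assumptions based on the description above:

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def iterate_with_cancellation(
    results: AsyncGenerator[T, None],
    is_cancelled: Callable[[], Awaitable[bool]],
    poll_period_s: float = 1.0,  # assumed poll period, not from the PR
) -> AsyncGenerator[T, None]:
    """Yield items from `results`, aborting if the client disconnects.

    The disconnection poll runs even before the first result arrives,
    so a request that is still queued can be cancelled as well.
    """
    next_task: asyncio.Task = asyncio.ensure_future(results.__anext__())
    try:
        while True:
            done, _ = await asyncio.wait({next_task}, timeout=poll_period_s)
            if not done:
                # Still waiting for the next result; check for disconnection.
                if await is_cancelled():
                    raise asyncio.CancelledError("client disconnected")
                continue
            try:
                item = next_task.result()
            except StopAsyncIteration:
                return
            yield item
            next_task = asyncio.ensure_future(results.__anext__())
    finally:
        if not next_task.done():
            next_task.cancel()
            # Let the pending __anext__ unwind before closing the generator.
            await asyncio.gather(next_task, return_exceptions=True)
        # Closing the underlying generator propagates the cancellation into
        # the engine, which can then abort the request.
        await results.aclose()
```

Consumers iterate over the wrapper with `async for`; a client disconnect then surfaces as a CancelledError in the handler, matching the behaviour described for generate() above.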

I also plan to add some new tests to cover these various cases.

github-actions bot commented Aug 3, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of tests to quickly catch errors. You can run the other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@robertgshaw2-neuralmagic (Collaborator) commented:

Nice

@njhill added the ready label (ONLY add when PR is ready to merge / full CI is needed) on Aug 5, 2024
@DarkLight1337 (Member) left a comment:

LGTM, sorry for keeping you waiting.

@DarkLight1337 merged commit 9a3f49a into vllm-project:main on Aug 7, 2024
51 checks passed
@njhill deleted the rework-aborts branch on August 7, 2024 13:07
njhill added a commit to njhill/vllm that referenced this pull request Aug 7, 2024
vllm-project#7111 made a change to the merge_async_iterators utils function to add an is_cancelled arg. It would be good for this new arg to be optional to retain backwards compatibility for other server front-ends that might already be using this utility function.
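
As a hedged sketch (not the real vLLM signature), making a newly added argument keyword-only with a default value keeps existing call sites working unchanged. The queue-based merge below is only illustrative; error handling is omitted for brevity:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

_STOP = object()  # sentinel marking a finished producer


async def merge_async_iterators(
    *iterators: AsyncIterator[T],
    is_cancelled: Optional[Callable[[], Awaitable[bool]]] = None,
) -> AsyncIterator[Tuple[int, T]]:
    """Merge async iterators into one stream of (index, item) pairs.

    `is_cancelled` is keyword-only and defaults to None, so callers written
    before the argument existed keep working unchanged; when provided, it is
    polled while waiting so the merged stream can stop early on disconnect.
    """
    queue: asyncio.Queue = asyncio.Queue()

    async def produce(index: int, iterator: AsyncIterator[T]) -> None:
        try:
            async for item in iterator:
                await queue.put((index, item))
        finally:
            await queue.put(_STOP)

    producers = [asyncio.ensure_future(produce(i, it))
                 for i, it in enumerate(iterators)]
    finished = 0
    try:
        while finished < len(producers):
            try:
                entry = await asyncio.wait_for(queue.get(), timeout=1.0)
            except asyncio.TimeoutError:
                if is_cancelled is not None and await is_cancelled():
                    raise asyncio.CancelledError("client disconnected")
                continue
            if entry is _STOP:
                finished += 1
            else:
                yield entry
    finally:
        # Stop any remaining producers and retrieve their results/exceptions.
        for task in producers:
            task.cancel()
        await asyncio.gather(*producers, return_exceptions=True)
```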
njhill added a commit to njhill/vllm that referenced this pull request Aug 9, 2024
Follow-on from vllm-project#7111: avoid unnecessarily enqueuing a final message after an exception, and avoid aborting requests in the engine that were never started.
sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024